Machine Learning for Streamflow Prediction: Current Status and Future Prospects

Poster at the AGU 2019 Fall Meeting presenting Martin's work on a model comparison study for the Great Lakes/Lake Erie region.

Abstract

Accurate streamflow prediction is an open challenge in hydrology. We show that approaches based on machine learning can provide more accurate predictions than at least some physically-based models, and we discuss the potential of hybrid approaches for further improvement.

The Great Lakes Runoff Intercomparison Projects for Lake Erie and the Great Lakes (GRIP-E/GRIP) establish standardized datasets to benchmark streamflow prediction models. In this context, we compare physically-based models with two machine-learned models driven purely by data: a gradient-boosted regression tree framework (XGBoost) and a neural network architecture. Following the GRIP-E intercomparison, we train our models on meteorological forcings of the Lake Erie watershed from 2010 to 2012 and test them on 2013 and 2014.
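A minimal sketch of the chronological split described above, assuming daily time stamps held as Python `date` objects. The function name `split_by_year` and the synthetic date range are illustrative and not taken from the GRIP-E tooling.

```python
from datetime import date, timedelta


def split_by_year(dates, train_years=(2010, 2012), test_years=(2013, 2014)):
    """Return index lists for a chronological train/test split.

    Year ranges are inclusive; the defaults mirror the GRIP-E periods
    (training 2010-2012, testing 2013-2014).
    """
    train_idx = [i for i, d in enumerate(dates) if train_years[0] <= d.year <= train_years[1]]
    test_idx = [i for i, d in enumerate(dates) if test_years[0] <= d.year <= test_years[1]]
    return train_idx, test_idx


# Hypothetical example: one time stamp per day from 2010-01-01 to 2014-12-31.
start = date(2010, 1, 1)
n_days = (date(2014, 12, 31) - start).days + 1
dates = [start + timedelta(days=k) for k in range(n_days)]
train_idx, test_idx = split_by_year(dates)
print(len(train_idx), len(test_idx))  # 1096 730 (2012 is a leap year)
```

Splitting by calendar year rather than at random keeps every test day strictly after the training period, so the models are never fitted on data from the years they are evaluated on.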

We find that both data-driven approaches outperform at least some physically-based models, such as the large-scale, semi-distributed Variable Infiltration Capacity model based on Grouped Response Units (VIC-GRU). Our XGBoost model is trained on temperature and precipitation over a fixed eight-day window and yields a median Nash–Sutcliffe Efficiency (NSE) of 0.52 across 46 gauging stations. Our neural network is a convolutional long short-term memory architecture that operates directly on the gridded forcing time series and achieves a median NSE of 0.35. The physically-based VIC-GRU model achieves a median NSE of 0.26.
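The eight-day input window and the NSE metric can be sketched as follows. This assumes daily, basin-averaged temperature and precipitation series and a window that ends on the prediction day; the source does not say whether the window includes the target day, so treat that choice, and the helper names `make_windows` and `nse`, as hypothetical. The resulting feature rows would then be passed to a gradient-boosted regressor such as XGBoost (not shown).

```python
import numpy as np


def make_windows(temperature, precipitation, window=8):
    """Build one feature row per day from the last `window` days of forcings.

    Assumption: the row for day t covers days t - window + 1 .. t (inclusive);
    the first window - 1 days lack a full history and yield no row.
    """
    temperature = np.asarray(temperature, dtype=float)
    precipitation = np.asarray(precipitation, dtype=float)
    rows = []
    for t in range(window - 1, len(temperature)):
        past = slice(t - window + 1, t + 1)
        # `window` temperature values followed by `window` precipitation values.
        rows.append(np.concatenate([temperature[past], precipitation[past]]))
    return np.array(rows)


def nse(observed, simulated):
    """Nash–Sutcliffe Efficiency: 1 minus summed squared error over summed squared deviation from the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread


# Hypothetical example: ten days of synthetic forcings.
X = make_windows(np.arange(10), np.arange(10) * 0.1)
print(X.shape)  # (3, 16)

# Three toy simulations standing in for three stations; report the median NSE.
obs = np.array([1.0, 2.0, 3.0, 4.0])
scores = [nse(obs, obs), nse(obs, np.full(4, obs.mean())), nse(obs, obs + 1.0)]
print(np.median(scores))
```

An NSE of 1 means a perfect fit, 0 means the model is no better than always predicting the observed mean, and negative values are worse than that baseline. Reporting the median across stations keeps a few poorly modeled gauges from dominating the summary.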

Although the tree-based model is currently more accurate than the neural network, we consider further exploration of neural models worthwhile because they can operate directly on the gridded time series, potentially capturing temporal and spatial relationships. Neural networks have a large number of parameters and therefore require copious training data; nonetheless, we expect their predictions to ultimately surpass those of tree-based models. Furthermore, preliminary experiments suggest that neural models may generalize better to ungauged basins for which we did not supply training data.

Looking ahead, neural networks can combine time-series forcing input with static geophysical data such as soil maps. Taken together, we hope that neural networks can provide the foundation of hybrid approaches that both improve accuracy and allow for a better understanding of the physical processes underlying streamflow.
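One common way to feed static maps to a model that consumes gridded forcings is to repeat each static layer at every time step and stack it as extra input channels. The sketch below assumes inputs shaped `(time, channels, height, width)` and a static stack shaped `(static_channels, height, width)` on the same grid; the function name `combine_inputs` and the random example arrays are hypothetical, and this is not necessarily how such a hybrid model would be built.

```python
import numpy as np


def combine_inputs(dynamic, static):
    """Append time-invariant maps to every time step of a gridded forcing sequence.

    dynamic: shape (time, dynamic_channels, height, width), e.g. temperature and precipitation grids.
    static:  shape (static_channels, height, width), e.g. soil-map layers on the same grid.
    Returns shape (time, dynamic_channels + static_channels, height, width).
    """
    dynamic = np.asarray(dynamic, dtype=float)
    static = np.asarray(static, dtype=float)
    # Repeat the static stack along a new leading time axis without copying data.
    tiled = np.broadcast_to(static[np.newaxis], (dynamic.shape[0],) + static.shape)
    # Stack along the channel axis; concatenate materializes a new array.
    return np.concatenate([dynamic, tiled], axis=1)


rng = np.random.default_rng(0)
forcings = rng.normal(size=(8, 2, 4, 5))  # 8 days, 2 forcings, 4 x 5 grid (synthetic)
soil = rng.random(size=(3, 4, 5))         # 3 hypothetical static soil layers
x = combine_inputs(forcings, soil)
print(x.shape)  # (8, 5, 4, 5)
```

Treating static layers as extra channels lets the same convolutional filters relate forcings to local soil properties at each grid cell, at the cost of repeating the static data at every time step.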


Citation

@inproceedings{gauch2019agu,
  title={Machine Learning for Streamflow Prediction: Current Status and Future Prospects},
  author={Gauch, M. and Tang, R. and Mai, J. and Tolson, B. and Gharari, S. and Lin, J.},
  booktitle={AGU Fall Meeting 2019},
  venue={San Francisco, CA},
  year={2019},
  organization={AGU}
}